Finding Optimal Pairs of Patterns

Authors

  • Hideo Bannai
  • Heikki Hyyrö
  • Ayumi Shinohara
  • Masayuki Takeda
  • Kenta Nakai
  • Satoru Miyano
Abstract

We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e., finding the pair of patterns that is best with respect to some appropriate scoring function that gives higher scores to pattern pairs which occur more in the strings of one set but less in the other. We present an O(N) time algorithm for finding the optimal pair of substring patterns, where N is the total length of the strings. The algorithm considers all possible Boolean combinations of the patterns, e.g., patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to match a given string s if p occurs in s AND q does NOT occur in s. The same algorithm can be applied to a variant of the problem where we are given a single set of sequences along with a numeric attribute assigned to each sequence, and the problem is to find the optimal pattern pair whose occurrence in the sequences is correlated with this numeric attribute. An efficient implementation based on suffix arrays is presented, and the algorithm is applied to several nucleotide sequence datasets of moderate size, combined with microarray gene expression data, with the aim of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing certain genomic functions.
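To make the problem statement concrete, here is a minimal brute-force sketch of the search the abstract describes. It is NOT the paper's O(N) suffix-array algorithm; it simply enumerates candidate substring pairs (p, q) and all 16 two-input Boolean functions F, and scores how well F(p, q) separates a positive set from a negative set. Several details are assumptions for illustration only: the names `best_pattern_pair`, `diff_score`, `candidate_patterns`, and `apply_bool` are hypothetical; Boolean functions are encoded as 4-bit truth tables; candidates are limited to substrings of length at most `max_len` that occur in the input; and the score (absolute difference of match fractions) is a stand-in for the "appropriate scoring function" the paper leaves general.

```python
from typing import Sequence

# Truth-table encodings of 2-input Boolean functions F(p, q) (an assumed encoding).
# Bit (2*p + q) of the integer holds F's output for inputs (p, q).
P_AND_Q = 0b1000
P_AND_NOT_Q = 0b0100
P_OR_Q = 0b1110


def apply_bool(f: int, p: bool, q: bool) -> bool:
    """Evaluate the Boolean function whose 4-bit truth table is f."""
    return ((f >> (2 * int(p) + int(q))) & 1) == 1


def candidate_patterns(strings: Sequence[str], max_len: int) -> list[str]:
    """Distinct substrings of length 1..max_len occurring in `strings` (toy bound)."""
    subs = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, min(len(s), i + max_len) + 1):
                subs.add(s[i:j])
    return sorted(subs)


def diff_score(pos_hits: Sequence[bool], neg_hits: Sequence[bool]) -> float:
    """Hypothetical score: |match fraction in positives - match fraction in negatives|."""
    return abs(sum(pos_hits) / len(pos_hits) - sum(neg_hits) / len(neg_hits))


def best_pattern_pair(pos, neg, max_len=3, score=diff_score):
    """Exhaustively try ordered pairs (p, q), p != q, with all 16 functions F.

    Returns (best_score, p, q, f). This naive enumeration is far slower than
    the O(N) algorithm described in the abstract; it only illustrates the objective.
    """
    strings = list(pos) + list(neg)
    n_pos = len(pos)
    cands = candidate_patterns(strings, max_len)
    # occ[x][i] is True iff pattern x occurs as a substring of strings[i]
    occ = {x: [x in s for s in strings] for x in cands}
    best = (-1.0, None, None, None)
    for p in cands:
        for q in cands:
            if p == q:
                continue
            for f in range(16):
                hits = [apply_bool(f, a, b) for a, b in zip(occ[p], occ[q])]
                sc = score(hits[:n_pos], hits[n_pos:])
                if sc > best[0]:  # strict '>' keeps the first pair found on ties
                    best = (sc, p, q, f)
    return best


if __name__ == "__main__":
    # No single character separates these sets perfectly, but 'A' AND NOT 'T' does.
    print(best_pattern_pair(["AC", "AG"], ["AT", "CG"], max_len=1))
    # → (1.0, 'A', 'T', 4), where 4 == P_AND_NOT_Q
```

In this toy example, the best single character ('A' or 'T') reaches a score of only 0.5, while the combination p ∧ ¬q with p = 'A' and q = 'T' matches both positive strings and neither negative one. For the numeric-attribute variant mentioned in the abstract, one would presumably replace `score` with a function of the match vector and the per-sequence attribute values (for example, some correlation-type measure), which this sketch's two-set signature does not cover.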

Publication date: 2004